Clinical relevance of somatic mutations in Chinese lung adenocarcinoma and their prognostic implications for survival

Abstract Background To comprehensively elucidate the genomic and mutational features of lung cancer, particularly lung adenocarcinoma (LUAD), ongoing investigation of the genomic landscape is imperative. In this study, we analyzed the somatic mutation profile and assessed the significance of informative genes in a retrospective LUAD cohort. Methods A total of 247 Chinese samples were analyzed to characterize the tumor somatic genomic alterations in patients with LUAD. Cox regression analysis was employed to identify prognosis‐related genes and establish a predictive model for stratifying patients with LUAD. Results In the Dianjiang People's Hospital (DPH) cohort, the five most frequently mutated genes were epidermal growth factor receptor (EGFR) (68%), TP53 (30%), RBM10 (13%), LRP1B (9%), and KRAS (9%). Of these, EGFR was the most frequently altered driver gene, and most of its mutation sites were located in the tyrosine kinase region. Oncogenic pathway alteration and mutation signature analyses identified RTK‐RAS pathway alteration and smoking as the main carcinogenic factors in the DPH cohort. Furthermore, we identified 34 driver genes in the DPH cohort, including EGFR (68%), TP53 (30.4%), RBM10 (12.6%), KRAS (8.5%), and LRP1B (8.5%), and 45 Clinical Characteristic‐Related Genes (CCRGs) were found to be closely related to clinical high‐risk factors. We developed a Multiple Parameter Gene Mutation (MPGM) risk model by integrating critical genes and oncogenic pathway alterations in LUAD patients from the DPH cohort. Based on publicly available LUAD datasets, we identified five genes, namely BRCA2, anaplastic lymphoma kinase (ALK), BRAF, EGFR, and platelet‐derived growth factor receptor alpha (PDGFRA), according to multivariable Cox regression analysis. The MPGM‐low group showed significantly better overall survival (OS) than the MPGM‐high group (p < 0.0001, area under the curve (AUC) = 0.754). The robust performance was validated in 55 LUAD patients from the DPH cohort and in another LUAD dataset. Immune characteristics analysis revealed a higher proportion of dendritic cells (DCs) and mononuclear cells in the MPGM‐low risk group, while the MPGM‐high risk group showed fewer immune cells and higher tumor cell infiltration. Conclusion This study provides a comprehensive genomic landscape of Chinese LUAD patients and develops an MPGM risk model for LUAD prognosis stratification. Patients in the DPH cohort will continue to be followed up to explore the genetic features of resistance and prognosis.


| INTRODUCTION
According to global cancer statistics, lung cancer is the leading cause of cancer-related death. 1 Based on histological classification, non-small cell lung cancer (NSCLC) accounts for approximately 85% of lung cancer cases, and lung adenocarcinoma (LUAD) is the most prevalent subtype of NSCLC. 2 Multiple studies and expert consensus have furnished compelling evidence linking the survival of cancer patients to clinical and pathological characteristics, including age, clinical stage, smoking status, and family history. 3 Tu et al. prospectively studied the association of smoking characteristics with all-cause mortality in 23 cancer types. They demonstrated the significant harm of current smoking on survival after a cancer diagnosis. In lung cancer, current smokers and former smokers were at greater risk of death. 4,5 Guerreiro et al. reported, based on a large population study, that relative survival decreased progressively with increasing age (>50) and clinical stage at diagnosis. 5,6 In addition, tumor spread through air spaces (STAS) and vascular invasion are aggressive clinicopathological characteristics and independent poor prognostic factors for recurrence-free survival in NSCLC according to multivariate Cox regression analysis. 7,8 Abundant clinical features and prognostic biomarkers are critical in guiding precise therapy decisions and improving survival outcomes. However, most studies primarily focus on the prognostic significance of these clinical features, with limited evidence regarding the association between somatic tumor alterations and clinicopathologic features. Therefore, there is an urgent need for robust genomic evidence to unravel the molecular mechanisms linking gene alterations to clinicopathologic characteristics in LUADs.
With the continual advancement and iteration of next-generation sequencing (NGS) technology, abundant driver gene mutations have been identified, leading to a more comprehensive genomic landscape of LUAD. 9 Several studies have verified that tumor genome alterations are closely related to cancer occurrence and development. In addition, driver gene and oncogenic pathway alterations could also provide effective targeted therapy opportunities. 10 The receptor tyrosine kinase (RTK)/RAS/RAF pathway genes are frequently mutated in LUAD. 11 Recurrent somatic genomic alterations in genes such as EGFR, KRAS, BRAF, and others have emerged as conventional therapeutic targets, leading to revolutionary improvements in clinical outcomes. 12 Nevertheless, the vulnerability of targeted therapy, characterized by the emergence of recurrent drug resistance and adverse reactions, highlights the imperative to gain a deeper understanding of tumor-specific somatic alterations and the underlying molecular mechanisms in LUAD. Despite the significant advancements in clinical trials for targeted therapy and immunotherapy, which have improved clinical outcomes for patients with LUAD, the practical implementation of these effective treatment options remains restricted to a minority of patients. Notably, the 5-year survival rate for patients with advanced clinical stages (III-IV) remains below 60%. 1,13 Significant heterogeneity has been observed among LUAD patients in histological subtypes and in genomic and cellular characteristics. 14 Therefore, we aim to develop a prognosis risk model based on mutation characteristics to enable risk stratification for patients, particularly when treatment options are restricted.
In this study, we comprehensively analyzed genomic mutation characteristics in a Chinese LUAD cohort. We investigated the genes whose mutation frequencies differed significantly according to clinicopathological risk factors, potentially influencing the development of specific clinical and pathological characteristics. Subsequently, a comprehensive screening was performed, incorporating these genes and hotspot mutations, driver gene alterations, and oncogenic pathway genes to identify a gene set that exhibits the strongest correlation with patient survival prognosis in LUAD. Leveraging these patient-specific genomic alterations, we developed a prediction model to assist clinicians in patient risk stratification. Our study provides new insight into the genomic alterations and prognosis stratification for LUAD.

| Data
The Dianjiang People's Hospital (DPH) cohort comprised 247 LUAD patients. The baseline information of patients in the DPH cohort is provided in Table S1, and the somatic mutation data are provided in Table S2.
The EAS cohort 15 included 299 Chinese patients with LUAD and was used as a training cohort to develop the prognosis risk model. The corresponding clinical information and the top 100 mutated genes are summarized in Tables S3 and S4, respectively. The RNA-seq data of 169 patients with LUAD from the EAS cohort were also acquired from cBioPortal and were published in the previous article.
The validation cohort (MSK-LUAD) 16 comprised 394 LUAD samples obtained from an Asian population within the MSK-MET dataset (Memorial Sloan Kettering-Metastatic Events and Tropisms, containing 3853), and was used to validate the predictive performance of the risk model. The mutation and corresponding clinical data of the EAS and MSK-LUAD cohorts were obtained from cBioPortal (https://www.cbioportal.org/). The clinical information and the top 100 mutated genes are summarized in Tables S5 and S6, respectively.
Among the 247 LUAD patients from the DPH cohort, 55 were treated with targeted therapy, chemotherapy, and/or immunotherapy and participated in regular follow-up examinations. Detailed prognosis and therapy (drug type and adverse reaction) information is provided in Table S7. The last follow-up was performed on 2nd June 2023, and progression-free survival (PFS) was used as the survival endpoint.

| Targeted sequencing
The formalin-fixed paraffin-embedded (FFPE) samples of the DPH cohort were profiled with targeted sequencing (1123-gene panel) (ChosenMed, Beijing, China), and germline variation was sequenced in parallel using paired adjacent tissues or peripheral blood samples. DNA extraction from tumor specimens was performed using the Maxwell RSC FFPE Plus DNA Kit (Promega, Cat. no. AS1720), followed by shearing of genomic DNA (gDNA) into approximately 200 bp fragments using a Covaris E210 system (Covaris, Inc.). Construction of the NGS library was accomplished using the KAPA HyperPrep Kit (Roche, 07962312001) and the Agilent SureSelect XT kit (Agilent, G9702C). The library's quantity was assessed with the Qubit 3.0 Fluorometer (Life Technologies, Inc.), while its quality and fragment size were determined using an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc.). Finally, targeted sequencing of the gDNA was conducted, employing paired-end sequencing on an Illumina NovaSeq 6000 platform (Illumina Inc.) with a 150-bp read length.

| Data analysis
Somatic mutations were identified by aligning reads to the human reference genome (UCSC hg19) using the Burrows-Wheeler Aligner (BWA version 0.7.11), followed by local realignment with the IndelRealigner tool from the Genome Analysis Toolkit (GATK, version 3.6) 17 and mutation calling with VarScan software. 18 Annotation was performed using ANNOVAR, and variants were filtered based on several criteria: exclusion of intronic variants, exclusion of variants present in more than 1% of the 1000 Genomes Project population, exclusion of common single-nucleotide polymorphisms (SNPs) in dbSNP, exclusion of synonymous variants, removal of variants with <50 supporting reads, and elimination of mutations flagged as "black flag." Samples were additionally required to have tumor cell content ≥10%, an average sequencing depth ≥2000×, and a DNA insert fragment size ≥140 bp.
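The filtering criteria above can be expressed as a simple predicate over annotated variant records. The following is a minimal sketch, not the pipeline used in this study: the `Variant` fields, the label strings, and the three example calls are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names and label strings are hypothetical, chosen for illustration.
    gene: str
    region: str              # e.g. "exonic" or "intronic"
    effect: str              # e.g. "missense" or "synonymous"
    kg_frequency: float      # population frequency in the 1000 Genomes Project
    dbsnp_common: bool       # True if listed as a common SNP in dbSNP
    supporting_reads: int    # reads supporting the variant allele
    black_flag: bool         # True if flagged as "black flag"


def passes_variant_filters(v: Variant) -> bool:
    """Apply the variant-level exclusion criteria listed in the text."""
    return (
        v.region != "intronic"
        and v.kg_frequency <= 0.01      # not present in more than 1% of 1000 Genomes
        and not v.dbsnp_common
        and v.effect != "synonymous"
        and v.supporting_reads >= 50    # variants with <50 supporting reads are removed
        and not v.black_flag
    )


def passes_sample_qc(tumor_content: float, mean_depth: float, insert_size_bp: float) -> bool:
    """Sample-level thresholds: tumor content >= 10%, depth >= 2000x, insert >= 140 bp."""
    return tumor_content >= 0.10 and mean_depth >= 2000 and insert_size_bp >= 140


# Made-up example calls (not data from the cohort).
calls = [
    Variant("EGFR", "exonic", "missense", 0.0, False, 320, False),
    Variant("TP53", "intronic", "none", 0.0, False, 500, False),
    Variant("KRAS", "exonic", "missense", 0.0, False, 12, False),
]
kept = [v.gene for v in calls if passes_variant_filters(v)]
print(kept)
```

Keeping the variant-level and sample-level checks in separate functions mirrors the distinction between per-variant exclusions and per-sample quality thresholds.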

| Genomic alterations analysis
Tumor mutational burden (TMB) was determined by counting all nonsynonymous mutations and indels per megabase (Mb) of the sequenced region. A threshold of 10 mutations per Mb was used to classify tumors as either TMB-high or TMB-low. Driver genes of the EAS LUAD cohort were identified using the MutSigCV 19 method.
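As a minimal sketch of the TMB calculation and classification described above: the panel size passed in is an assumption (the covered size of the 1123-gene panel is not stated in the text), and treating a value exactly equal to the threshold as TMB-high is also an assumption.

```python
def tumor_mutational_burden(n_mutations: int, panel_size_mb: float) -> float:
    """Mutations (nonsynonymous SNVs plus indels) per megabase of covered sequence."""
    if panel_size_mb <= 0:
        raise ValueError("panel_size_mb must be positive")
    return n_mutations / panel_size_mb


def classify_tmb(tmb: float, threshold: float = 10.0) -> str:
    """Classify a tumor using the 10 mutations/Mb cutoff.

    Assumption: values equal to the threshold are treated as TMB-high.
    """
    return "TMB-high" if tmb >= threshold else "TMB-low"


# Hypothetical example: 30 qualifying mutations on a 3.0 Mb panel.
tmb = tumor_mutational_burden(30, 3.0)
print(tmb, classify_tmb(tmb))
```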

| Development of the risk model
Multiple parameter gene mutations (MPGMs) with a mutation frequency >0% in the training cohort were retained, and the remaining candidate genes were further screened using univariate and multivariable Cox regression analysis. Multivariable Cox regression was performed to develop the MPGM risk model, and the risk scores were calculated using the following formula: $\text{MPGM score} = \sum (\text{mutation status} \times \text{coefficient})$, where the sum runs over the model genes. The mutation status was defined as follows: "0" denoted the absence of any mutations, while "1" indicated the presence of mutations. The median score was used as the cutoff to stratify patients into MPGM-high and MPGM-low groups.

| Immune infiltrating analysis of risk model
The relative abundance of immune infiltrating cells was analyzed using several algorithms, including Cibersort, 20 Timer, 21 Quantiseq, 22 and MCPcounter. 23 Among them, Cibersort provided the relative abundance of immune cells. EPIC was used to estimate the ratio of immune cells to cancer cells. 24 The ESTIMATE method was used to predict tumor purity and stromal abundance and to assess the level of immune cell infiltration. 25 Additionally, Xcell was employed to evaluate the relative enrichment level of immune cells within the tumor tissue. 26

| Statistical analysis
Fisher's exact test (two-sided) was used to assess the correlation between clinicopathological features and somatic mutations. The chi-square test was used to analyze differences in the proportion of smokers between the two mutation signature groups. The two-sided Wilcoxon rank-sum test was used to compare continuous variables, such as TMB level, between two groups. Kaplan-Meier survival curves and Cox regression (hazard ratio [HR] and 95% confidence interval [CI]) were used for OS and PFS estimation. p < 0.05 or FDR < 0.05 was considered statistically significant. The performance of the risk model was assessed by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Statistical analyses were conducted using R software (version 4.1.2).
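To illustrate how a two-sided Fisher's exact test relates a binary clinical feature to mutation status, the sketch below computes the p-value for a 2×2 table directly from the hypergeometric distribution. The analyses in this study were run in R; this Python version and the example counts are for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be mutated / wild-type and columns feature present / absent.
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


# Made-up counts: 3 of 4 mutated cases and 1 of 4 wild-type cases show the feature.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```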

| Clinical characters of DPH cohort
We retrospectively enrolled 247 Chinese LUAD patients who sought treatment at the Dianjiang People's Hospital (the "DPH cohort," N = 247). We collected comprehensive clinicopathologic features of the patients, including age, sex, tumor size, smoking status, body mass index (BMI), family history of cancer, ki67 expression, stage, histological classification, histological grade, STAS, lymph node metastasis, vascular invasion, and TMB value, to assess their correlation with genomic alterations and survival prognosis. All tumor tissues of LUAD patients in the DPH cohort underwent targeted sequencing with an 1123-gene panel (ChosenMed, Beijing, China).

| Identification of a somatic mutation in LUAD of DPH cohort
We applied strict filtering criteria to screen for reliable mutation sites. A total of 1419 single-nucleotide variants and 101 insertions/deletions (indels) were called (see "Section 2" and Table S2). The mutation landscape of the top 20 most frequently mutated genes and the corresponding clinicopathological information is shown in Figure 1A. The lollipop plot shows the mutation positions and types of these 20 genes (Figure S1). Based on the TMB cutoff value of 10 mutations/Mb, 27 LUAD patients (10.93%) were classified into the TMB-H group. The median TMB value of the DPH cohort was 3.43 mutations/Mb, and the median TMB values of smokers and non-smokers were 3.95 and 3.29 mutations/Mb, respectively (Table S1).
This study defined Clinical Characteristic-Related Genes (CCRGs) as genes frequently mutated in individuals with higher clinical risk factors. These risk factors include smoking, advanced pathological stage (Stage II−IV), lower histological differentiation grade, solid histological subtype, age ≥50 years, tumor size ≥2 cm, ki67 expression proportion ≥10%, family history of cancer, BMI ≥24, lymph node metastasis, and the presence of STAS and vascular invasion. We identified 45 genes significantly associated with higher clinical risk factors (Figure 1B). Frequent mutations in TP53 and SPTA1 were observed in the smoking population, while KEAP1 and TAF1 mutations were associated with a higher BMI. Patients with STAS exhibited a higher frequency of TP53, MTOR, CTNNB1, BARD1, BMPR2, DDX3X, MCL1, and SUFU mutations. Similarly, patients carrying TP53, ERBB2, PDGFRB, POLD1, and PTPRT mutations were more likely to display vascular invasion. TP53, LRP1B, SETD2, RAD50, ALK, CTNNB1, and BRCA2 mutations were frequently observed in patients with larger tumors. The expression of ki67 in tumor tissues strongly correlated with mutations in TP53, LRP1B, SETD2, ALK, KMT2C, EP300, KMT2B, and PDGFRA. Lymph node metastasis was closely related to mutations in TP53, SETD2, ERCC2, FANCF, and FLI1, while patients with advanced pathological stage exhibited a higher frequency of TP53, LRP1B, CDKN2A, GRIN2A, DICER1, BRIP1, NOTCH1, and PALB2 mutations. Furthermore, MTOR was mutated more frequently in the solid histological subtype. These findings suggest a potential association between gene mutations and various clinicopathological features, although further experimental evidence is required to confirm the underlying biological mechanisms. We also retrieved the pathogenic significance of these CCRGs from the Cancer Gene Census (CGC) database. Of these, 18 were not reported, and the other 27 played essential roles as oncogenes (12/27), fusion genes (9/27), and tumor suppressor genes (17/27) (Table S9).
We combined the top 20 frequently mutated genes, 34 driver genes, and 45 CCRGs as essential genes in LUAD patients from the DPH cohort. Seven of these 72 critical genes (ERBB2, KRAS, LRP1B, RB1, SETD2, SPTA1, and TP53) carried all three labels (Table S9). Among the seven genes, SPTA1 has not been reported as a driver gene in the CGC database, and LRP1B and SETD2 were reported as driver genes for other cancer types.
Mutation signature analysis was performed using the somatic mutation data of the DPH cohort. We observed four significantly enriched signatures (Figure 2C). The most prevalent signatures were cosmic4 (smoking signature, cosine similarity = 0.736) and cosmic15 (defective DNA mismatch repair signature, cosine similarity = 0.664). Furthermore, we compared smoking status between the "smoking" signature group and the remaining patients. The percentage of smokers within the "smoking" signature group was significantly higher than that among the remaining patients (p < 0.001, Figure 2D).
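The cosine similarity reported for each signature measures how closely an extracted mutation profile matches a reference signature. A minimal sketch of the measure is given below; the short vectors are toy inputs, whereas real signatures are typically defined over 96 trinucleotide mutation contexts.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two profiles of equal length.

    For non-negative profiles the value ranges from 0 (no shared contexts)
    to 1 (identical shape).
    """
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_u * norm_v)


# Toy 4-context profiles (hypothetical values).
extracted = [0.40, 0.30, 0.20, 0.10]
reference = [0.35, 0.35, 0.20, 0.10]
print(round(cosine_similarity(extracted, reference), 3))
```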
Overall, alterations in the RTK-RAS signaling pathway served as the primary oncogenic molecular driver in LUAD patients of the DPH cohort, while smoking acted as the primary clinical risk characteristic.

| Development and validation of Multiple Parameter Gene Mutation (MPGM) risk model
In the present study, we investigated the significant gene alterations observed in the DPH cohort, elucidating the relationship between clinicopathological features and the oncogenic implications of gene mutations. We next explored the prognostic roles of these gene alterations. Because adequate follow-up survival data were lacking for the DPH cohort, we used the EAS cohort to develop the risk model and validated its predictive performance in the MSK-LUAD cohort.
Seventy-two critical genes (top 20 frequently mutated genes, 34 driver genes, and 45 CCRGs, Table S9) and 10 oncogenic pathway gene alterations (at least one pathway gene mutated) in the DPH cohort were defined as MPGMs for LUAD patients and were used to screen prognosis-related signature genes during the initial stage. Genes with a mutation frequency of 0% in the EAS cohort were excluded (Figure S2). Ultimately, 56 essential genes and 10 oncogenic pathway gene alterations were selected for univariate and multivariable Cox regression analysis. Five MPGMs with p < 0.05 in both univariate and multivariable Cox regression analysis were retained (Table 1). The candidate MPGMs and their coefficients were BRCA2 (1.250), ALK (1.883), BRAF (1.083), EGFR (−0.547), and PDGFRA (0.973). The MPGM risk scores were therefore calculated using the following formula: $\text{MPGM score} = \sum (\text{mutation status} \times \text{coefficient})$. The mutation status was defined as follows: "0" denoted the absence of any mutations, while "1" indicated the presence of mutations. The cutoff was defined as the median value of the MPGM score in the EAS cohort.
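The scoring rule can be sketched in a few lines using the five coefficients reported above. The assignment of patients whose score equals the median to the MPGM-low group is an assumption, since the text does not state how ties at the cutoff are handled, and the example patients are made up.

```python
from statistics import median
from typing import Iterable, List, Set

# Coefficients from the multivariable Cox model reported in the text.
COEFFICIENTS = {
    "BRCA2": 1.250,
    "ALK": 1.883,
    "BRAF": 1.083,
    "EGFR": -0.547,
    "PDGFRA": 0.973,
}


def mpgm_score(mutated_genes: Set[str]) -> float:
    """Sum of coefficient x mutation status (1 if mutated, 0 otherwise)."""
    return sum((coef for gene, coef in COEFFICIENTS.items() if gene in mutated_genes), 0.0)


def stratify(scores: Iterable[float]) -> List[str]:
    """Split patients at the cohort median score.

    Assumption: scores strictly above the median are MPGM-high; ties are MPGM-low.
    """
    scores = list(scores)
    cutoff = median(scores)
    return ["MPGM-high" if s > cutoff else "MPGM-low" for s in scores]


# Made-up patients, each described by the set of mutated genes.
patients = [{"EGFR"}, {"EGFR", "TP53"}, {"EGFR"}, {"TP53"}, {"ALK"}]
scores = [mpgm_score(p) for p in patients]
print(list(zip(scores, stratify(scores))))
```

Because mutation status is binary and EGFR carries the only negative coefficient, many patients share the same score, so a median split can produce groups of unequal size.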
Stratifying the 299 patients with LUAD from the EAS cohort based on the MPGM risk model yielded MPGM-high (N = 26) and MPGM-low (N = 273) groups (Figure 3A). Survival analysis revealed a notably favorable prognosis in the MPGM-low group (p < 0.0001, HR = 3.7, 95% CI [2.2, 6.21], median overall survival [mOS]: not reached vs. 29 months). The MPGM risk model achieved its highest AUC of 0.754 for predicting 10-year survival (Figure 3B). Interestingly, we observed a decreased AUC when integrating the MPGM risk model with two pathway alterations that were significant in univariate Cox regression analysis: the NRF2 and PI3K pathways (Figure 3C). This finding suggests that combining these specific pathway alterations may reduce the predictive accuracy of the MPGM risk model.
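For intuition about what an AUC value expresses, the sketch below computes the ROC AUC for a binary outcome as the probability that a randomly chosen patient with the event has a higher risk score than a randomly chosen patient without it. This is a simplification: it ignores censoring and is not the time-dependent ROC analysis needed for a 10-year survival AUC. The scores and labels are made up.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (ties count as half a win).

    labels: 1 = event occurred, 0 = event did not occur.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up risk scores and outcomes.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```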
Moreover, we compared the predictive performance of the MPGM risk model alone with that of the model combined with other clinical factors, including age, gender, smoking, stage, histology, and TMB. The MPGM risk model combined with the smoking factor achieved an optimal AUC value of 0.748 (Figure 3D). Given the satisfactory predictive performance achieved by integrating the MPGM risk model and clinical characteristics, we performed survival analysis by combining the MPGM model and clinical factors. When age was combined with the MPGM risk model, the MPGM-low group consistently demonstrated superior survival compared to the MPGM-high group, regardless of whether the individuals were above or below the age of 64. Patients in the MPGM-low group who were younger than 64 years achieved the best OS (Figure S3A).
Additionally, this trend persisted irrespective of the TMB value being high or low (Figure S3D) and regardless of smoking status (Figure S3E). These findings highlight the independent predictive capacity of the MPGM risk model. When the MPGM model was combined with stage (Figure S3B), gender (Figure S3C), and histological differentiation grade (Figure S3F), it could further facilitate a more comprehensive prognostic stratification of patients, thereby aiding clinicians in making personalized clinical decisions.
To validate the predictive efficacy of the MPGM risk model, we utilized the MSK-LUAD cohort, which provided genomic alterations and survival data. Patients in the MSK-LUAD cohort were stratified into MPGM-high and MPGM-low groups based on the median cutoff value (Figure 3E). Notably, patients in the MPGM-low group exhibited slightly prolonged overall survival (OS) (N = 233, p = 0.0055, HR = 1.56, 95% CI [1.14, 2.51], mOS = 49 months) compared to those in the MPGM-high group (N = 161, mOS = 34 months). The AUC in the validation cohort was 0.619 (Figure 3F). Moreover, we attempted validation in the DPH cohort. By the latest follow-up visit on 2nd June 2023, the treatment and PFS information of 55 patients from the DPH cohort had been collected for validation analysis. As shown in Figure 3G, these patients could also be divided into MPGM-high and MPGM-low groups, although no significant PFS difference was observed (p = 0.47).
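Median survival values such as those above are read from Kaplan-Meier curves. The sketch below shows a basic product-limit estimator and how a median can be "not reached" when the curve never drops to 50%. It is an illustration on made-up follow-up times, not a reproduction of the analysis in this study, which was performed in R.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns (time, survival) at each distinct event time.

    events: 1 = event observed (death or progression), 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is at or below 50%; None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Made-up follow-up times in months.
print(median_survival(kaplan_meier([2, 4, 6, 8, 10], [1, 1, 1, 0, 1])))  # 6
print(median_survival(kaplan_meier([1, 2, 3, 4], [1, 0, 0, 0])))         # None
```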
Furthermore, we conducted a comparative analysis to elucidate the HLA variability (Figure S4A) and the proportion of distinctive clinical characteristics between the MPGM-high and MPGM-low groups (Figure S4B-O). According to the results, patients in the MPGM-high group had a significantly higher proportion of low differentiation (p < 0.01, Figure S4E), ki67 expression > 10% (p = 0.018, Figure S4I), and smokers (p = 0.02, Figure S4N).
Moreover, a comprehensive comparison was conducted to evaluate the disparities in 10 oncogenic pathway alterations and TMB levels between the MPGM-high and MPGM-low groups in the EAS and DPH cohorts. The alteration frequency of the 10 oncogenic pathways was higher in the MPGM-low group than in the MPGM-high group (Figure S5A). However, a significantly higher TMB level was observed in the MPGM-high group than in the MPGM-low group, both in the DPH cohort (Figure S5B) and the EAS cohort (Figure S5C). This observation was primarily attributed to the prevalence of sporadic mutations in the MPGM-low group. In contrast, the MPGM-high group predominantly exhibited specific carcinogenic gene alterations, which led to a higher hazard risk. The detailed gene mutations in the 10 oncogenic signaling pathways of the EAS and DPH cohorts are plotted in Figure S6. To inform therapy decisions, we analyzed the targetable gene mutation sites listed in the National Comprehensive Cancer Network guideline. The results showed that ERBB2_20ins, KRAS_G12/13X, and BRAF_V600E mutations were more frequent in the MPGM-high group, whereas EGFR mutations, including EGFR_L858R, EGFR_19del, EGFR_20ins, EGFR_T790M, EGFR_S768I, EGFR_L861Q, and EGFR_G719A, occurred more often in the MPGM-low group (Figure S5D). We conducted an RNA fusion analysis and identified oncogene fusions, including KIF5B-RET, CCDC6-RET, EZR-ROS1, EML4-ALK, and CD74-NRG1, in eight patients from the DPH cohort. Notably, all eight of these patients belonged to the MPGM-high group (Table S10). The analysis of copy number variations in the DPH cohort revealed a notable gene amplification (Table S11). Furthermore, we performed a Gene Ontology (GO) enrichment analysis on the five model genes (Figure S7), revealing a significant enrichment of the regulation of the MAPK cascade. This finding highlights the potential molecular mechanism underlying the MPGM risk model.

| Prognosis stratification and immune infiltrates analysis for different LUAD subgroups using the MPGM risk model
In addition, we examined the clinical applicability of the MPGM risk model in different LUAD subgroups of the EAS cohort. First, we used the MPGM model to stratify early-stage LUAD patients (stage I-II, N = 191), identifying 12 individuals in the MPGM-high group who require closer follow-up (p = 0.011, HR = 2.9; 95% CI [1.22, 6.87], mOS = 91 months vs. not reached) (Figure 5A). Among advanced-stage patients (stage III-IV, N = 106), 92 individuals were classified as MPGM-low using the MPGM risk model (p = 0.00015, HR = 3.34; 95% CI [1.73, 6.47], mOS = 50.9 vs. 18.1 months); these patients need comprehensive assessment to avoid overtreatment (Figure 5B). These findings underscore the clinical advantage of the MPGM model in guiding risk-based decision-making and optimizing patient management. Subsequently, in the EAS cohort, we identified 50 patients with LUAD who received tyrosine kinase inhibitor (TKI) treatment. Notably, one patient was classified as MPGM-high using the MPGM risk model, indicating the potential of the model in identifying treatment hazards (Figure 5C). Following the standard treatment guidelines, patients with EGFR mutations are eligible for corresponding TKI therapy. In contrast, treatment options are limited for patients without EGFR mutations, and conventional chemotherapy remains the primary choice. We investigated whether the MPGM model could offer additional prognostic stratification for patients without EGFR mutations (N = 154).
Furthermore, we investigated whether the MPGM model could be used to identify patients who are suitable or unsuitable for chemotherapy among those who had undergone chemotherapy (N = 68). Five patients with a median OS of 18.1 months were classified into the MPGM-high group, showing a higher risk when receiving chemotherapy (p < 0.0001, Figure 5D). These findings highlight the clinical utility of the MPGM risk model in guiding clinical decision-making regarding drug selection and potentially avoiding adverse outcomes.
Finally, we identified 169 LUAD patients with corresponding RNA-seq data in the EAS cohort. Survival analysis based on the MPGM model revealed significant differences in OS between the two groups (p = 0.0004, Figure 5F). Therefore, we further investigated the immune microenvironment features of these two groups based on the RNA-seq data of the 169 LUAD patients in the EAS cohort.
We compared the proportions of immune infiltrating cells in the MPGM-high and MPGM-low groups using various immune infiltration analysis methods, including EPIC, Quantiseq, Timer, Xcell, ESTIMATE, Cibersort, and MCPcounter (Figure S8). The ESTIMATE method assessed the proportion of non-tumor components, including stromal and immune cells. The MPGM-high group exhibited higher tumor purity and a lower ESTIMATEScore, indicating a higher degree of tumor invasiveness. Conversely, the MPGM-low group showed significant enrichment of dendritic cells (DCs) and monocytic lineage cells and a higher MicroenvironmentScore and ImmuneScore (Figure S8), suggesting enhanced anti-tumor immune activity.

| DISCUSSION
The latest results of the phase 3 ADAURA trial (NCT02511106) confirmed that adjuvant osimertinib achieved a significant 5-year OS benefit (85%) among patients with completely resected, EGFR-mutated, stage IB to IIIA NSCLC, demonstrating the efficacy of targeted therapy. 29 Recent clinical trials of perioperative immunotherapy in early-stage NSCLC, including phase 3 trials such as CheckMate 816 and KEYNOTE-671, have yielded remarkable advancements. These findings signify a paradigm shift in research focus from exploring survival benefits solely in advanced metastatic NSCLC to encompassing early- and mid-stage NSCLC patients and establishing a new perioperative treatment pattern. 30,31 Despite significant advancements in clinical trials for targeted therapy and immunotherapy, leading to improved outcomes in LUAD patients, a subset of patients cannot undergo surgery or lack targetable biomarkers.
Therefore, it is imperative to explore effective biomarkers to stratify patients and identify additional targetable biomarkers. 32 Consequently, our objective is to identify tumor-specific informative genes based on mutation
In this study, we retrospectively collected a moderate-size LUAD cohort and profiled its somatic alteration features based on genomic data from multi-gene panel sequencing. The alteration of the RTK-RAS pathway and smoking were identified as the main carcinogenic factors. We first explored the association between somatic tumor mutations and clinical risk factors. A total of 45 CCRGs were significantly related to high-risk clinicopathological characteristics, including smoking, advanced pathological stage (stage II−IV), lower histological differentiation grade, solid histological subtype, age ≥50 years, tumor size ≥2 cm, ki67 expression proportion ≥10%, family history of cancer, BMI ≥24, lymph node metastasis, and the presence of STAS and vascular invasion. The TP53 mutation is significantly associated with tobacco consumption. 33 In addition, TP53 mutations were substantially more frequent in patients with other high-risk clinical factors, including STAS+, high ki67 expression, larger tumor size, and lymph node metastasis. The TP53 mutation in tumors could result in the loss of tumor-suppressing function, promoting tumor proliferation and metastasis. 34 Regarding STAS, Ye et al. conducted a study to investigate the molecular characteristics distinguishing NSCLC patients with STAS+ and STAS−. Their findings revealed significant TP53 mutation and ALK fusion occurrences in STAS+ NSCLC patients. 35 In our study, we also observed a correlation between STAS+ and several oncogene mutations, namely MTOR and CTNNB1, as well as mutations in tumor suppressor genes, including BARD1, DDX3X, and SUFU. However, recurrent mutations of EGFR and KRAS were not found to be associated with STAS+, which is consistent with previous studies. 36,37
KEAP1 is a tumor suppressor gene in NSCLC, and its mutation was associated with a higher BMI in LUAD patients in our study. No relationship between KEAP1 and BMI has been reported. However, both KEAP1 mutation and BMI have been reported to be related to survival benefit from immunotherapy in NSCLC, which provides new insight for further study. 38,39 Significant associations were observed between vascular invasion and mutations in the oncogenes ERBB2 and PDGFRA, as well as the tumor suppressor genes POLD1 and PTPRT. PDGFRA is an angiogenesis-related gene, and the mRNA expression of PDGFRA and vascular invasion are positively related to HIF-1alpha in hepatocellular carcinoma (HCC), which could promote tumor inflammation. 40 Furthermore, mutations in the tumor suppressor genes LRP1B, SETD2, and TP53 were strongly associated with larger tumor size, elevated ki67 expression, lymph node metastasis, and advanced pathological stage in patients with LUAD, implying a significant association with proliferation and metastasis of LUAD. Kadara et al. discovered a correlation between SETD2 mutations and the progression as well as poor survival of patients with LUAD. 41 Additionally, LRP1B mutations have been found to significantly impact the Cell Cycle and Antigen Processing and Presentation pathways, which play crucial roles in tumorigenesis. Notably, NSCLC patients with LRP1B mutations exhibit high TMB values and derive substantial survival benefits from immunotherapy. 42 These CCRGs provide new insight into the molecular mechanisms underlying the occurrence of high-risk clinical factors, thus providing valuable avenues for further investigation in this area of research.
In addition to identifying gene mutations significantly associated with these clinical features, we also identified novel driver genes, including SPAT1, ANKRD11, ERCC3, RAD50, WRN, BLM, WISP3, CDK8, PAK3, and WEE1, which have not been reported in the Cancer Gene Census (CGC) database. Further experimental validation is required to elucidate their biological mechanisms in LUAD.
To assess the prognostic significance of LUAD-related genomic alterations, including CCRGs, driver genes, and gene mutations in oncogenic pathways, we developed an MPGM risk model using multivariable Cox regression in a publicly available Chinese LUAD cohort (EAS cohort). The risk model stratified patients into MPGM-high and MPGM-low groups, and patients in the MPGM-high group showed inferior OS compared with those in the MPGM-low group, as validated in an independent cohort (MSK-LUAD cohort). Subgroup analysis showed that the MPGM model stratifies LUAD patients robustly. The model effectively separated LUAD patients of the EAS cohort into MPGM-high and MPGM-low groups, irrespective of EGFR mutation status, treatment type (chemotherapy or targeted therapy), and disease stage (early or advanced). Marked survival differences were observed between the two risk groups, demonstrating the model's solid clinical applicability and stable prognostic performance.
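The stratification step described above can be sketched as follows: a Cox-type risk score is the sum of the coefficients of the genes mutated in a sample, and samples are split by a cutoff. This is a minimal sketch under stated assumptions. The coefficients in `HYPOTHETICAL_COEFS` are invented for illustration and are not the fitted values of the MPGM model, and the median cutoff is also an assumption, since the cutoff rule is not given in this section.

```python
from statistics import median

# Hypothetical log hazard ratios; NOT the fitted coefficients from the study.
HYPOTHETICAL_COEFS = {"BRCA2": 0.9, "ALK": 0.7, "BRAF": 0.8, "EGFR": -0.5, "PDGFRA": 0.6}


def risk_score(mutated_genes, coefs=HYPOTHETICAL_COEFS):
    """Linear predictor: sum of coefficients for the genes mutated in a sample."""
    return sum(beta for gene, beta in coefs.items() if gene in mutated_genes)


def stratify(samples, coefs=HYPOTHETICAL_COEFS, cutoff=None):
    """Label each sample MPGM-high or MPGM-low; the cutoff defaults to the median score."""
    scores = {sid: risk_score(genes, coefs) for sid, genes in samples.items()}
    if cutoff is None:
        cutoff = median(scores.values())
    return {sid: ("MPGM-high" if s > cutoff else "MPGM-low") for sid, s in scores.items()}


# Hypothetical patients and their mutated model genes
samples = {
    "P1": {"EGFR"},
    "P2": {"BRAF", "PDGFRA"},
    "P3": set(),
    "P4": {"ALK"},
}
groups = stratify(samples)
print(groups)
```

In practice the coefficients would come from the fitted multivariable Cox model, and the cutoff would be fixed in the training cohort before being applied to a validation cohort.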
Although statistical significance was not achieved in the DPH cohort, an analysis of 54 patients with complete PFS follow-up information showed that those in the MPGM-low group tended to have better PFS than those in the MPGM-high group. Additionally, several characteristics were more common in the MPGM-high group, including higher Ki67 expression, smoking history, lower differentiation grade, high TMB, and specific gene mutations in the RTK-RAS pathway. Significant differences were also observed in actionable mutation sites for targeted therapy. The MPGM-high group showed a higher prevalence of ERBB2_20ins, KRAS_G12/13X, and BRAF_V600E mutations, while the MPGM-low group had a higher occurrence of EGFR_L858R, EGFR_19del, EGFR_20ins, EGFR_T790M, EGFR_S768I, EGFR_L861Q, and EGFR_G719A mutations. Interestingly, all patients harboring detected oncogenic fusion genes, including RET, ROS1, ALK, and NRG1,12 were consistently classified into the MPGM-high group. These findings indicate that patients classified as MPGM-high carry higher risk in both clinical and molecular characteristics. Moreover, these results support the model's clinical utility and prognostic capacity in real-world settings. Given the favorable clinical applicability of the MPGM risk model and its improved prognostic performance when combined with clinicopathological features, we established a nomogram based on the Cox regression model. The nomogram allows clinicians to predict survival from baseline clinical information together with the MPGM risk model, which reflects molecular characteristics through three CCRGs (BRCA2, ALK, and PDGFRA) and two driver genes (BRAF and EGFR). All five genes play essential roles in LUAD.
To understand the reasons behind the disparate survival outcomes of the MPGM-high and MPGM-low groups, we analyzed differences in immune infiltration using RNA-seq data. Patients in the MPGM-high group exhibited higher tumor purity and a lower ESTIMATE score, indicative of increased tumor cell content and decreased immune cell enrichment.25 Patients in the MPGM-low group showed significant enrichment of dendritic cells (DCs) and monocytic lineage cells, together with a higher MicroenvironmentScore and ImmuneScore, demonstrating robust immune activation and anti-tumor capabilities. Previous studies have shown that tumor-associated DCs are related to improved survival outcomes in lung cancer owing to an increased anti-tumor T-cell response.43,44 Monocytic lineage cells can differentiate into macrophages, affecting tumor growth and survival outcomes.45 Monocytic lineage cells and macrophages demonstrate a significant interaction in LUAD, correlating with unfavorable survival outcomes and high-grade tumor subtypes.46 However, this disadvantage may be offset by heightened immune activity in the MPGM-low group.
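The inverse relation between ESTIMATE score and tumor purity noted above follows from the cosine transform published with the ESTIMATE method, which maps a higher score to a lower purity. The sketch below applies that transform to illustrative scores. Two caveats apply: the transform was calibrated on microarray data, so using it with RNA-seq-derived scores is an assumption here, and the function name `estimate_tumor_purity` and the example scores are hypothetical.

```python
import math


def estimate_tumor_purity(estimate_score):
    """Tumor purity from an ESTIMATE score via the published cosine transform."""
    return math.cos(0.6049872018 + 0.0001467884 * estimate_score)


# Illustrative scores only; a higher ESTIMATE score yields a lower purity
for score in (-1000, 0, 1000, 2000):
    print(score, round(estimate_tumor_purity(score), 3))
```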
In summary, the significantly higher anti-tumor capability and better survival of the MPGM-low group can mainly be attributed to the infiltration of immune cells, particularly DCs. In contrast, the MPGM-high group exhibits extensive invasion and infiltration of tumor cells, resulting in a markedly suppressive tumor microenvironment and a compromised survival prognosis. These findings support the reliability and robustness of our model's predictive performance.
We will apply the MPGM risk model in our clinical practice. Among the 55 LUAD patients under close follow-up in the DPH cohort, those in the MPGM-high group will be monitored more closely for disease progression and gene alterations during subsequent follow-up. In contrast, patients in the MPGM-low group will receive standard therapy and avoid overtreatment.
Our study has several limitations. First, the DPH cohort predominantly comprises early-stage LUAD patients, resulting in loss to follow-up for many individuals. As a result, only 55 patients have adhered to regular follow-up visits. Second, in the DPH cohort, the MPGM risk model did not demonstrate a statistically significant difference between the MPGM-high and MPGM-low groups. This lack of significance can be attributed to the relatively short follow-up period and the fact that only a minority of patients experienced disease progression. Because of the limited number of samples available for follow-up, we were unable to conduct further analysis of the impact of gene mutation status and treatment modalities on prognosis. Nevertheless, we remain committed to continuing follow-up of this group of patients.
In addition, genetic intratumor heterogeneity is closely linked to tumor occurrence, development, and overall outcome. Mutant-allele fraction (MAF) serves as a valuable metric for inferring tumor heterogeneity and purity, and its level has been implicated in influencing cancer prognosis, particularly in NSCLC.47 Given the potential impact of MAF on tumor behavior and patient prognosis, its effect on the performance of predictive models like ours warrants further exploration. In future work, we plan to integrate MAF into our analytical framework to better understand its mechanisms of influence and its implications for patient care and management.
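For reference, the MAF of a variant is usually computed as the number of reads supporting the mutant allele divided by the total read depth at that site. The sketch below shows this calculation and one simple per-sample summary. The read counts are hypothetical, and summarizing a sample by its median MAF is an assumption chosen for illustration rather than the approach used in the cited work.

```python
from statistics import median


def mutant_allele_fraction(alt_reads, ref_reads):
    """Fraction of reads at a site that support the mutant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / depth


# Hypothetical (mutant, reference) read counts for one sample
variants = {"EGFR_L858R": (120, 280), "TP53_R273H": (45, 255), "KRAS_G12C": (10, 190)}
mafs = {name: mutant_allele_fraction(alt, ref) for name, (alt, ref) in variants.items()}
sample_median_maf = median(mafs.values())
print(mafs, sample_median_maf)
```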

| CONCLUSIONS
In addition to exploring therapeutic targets for targeted therapy, it is essential to consider patient stratification based on prognosis. Beyond clinical and pathological features such as TNM staging and histological subtyping, leveraging patient-specific molecular alterations is crucial for precise risk stratification of heterogeneous patients, enabling personalized diagnostic and therapeutic assessments. In this study, we comprehensively investigated the molecular characteristics of LUAD and identified several novel putative driver genes and genes associated with clinicopathological characteristics. These findings provide new insight for studying molecular biological features and drug development for patients with LUAD. Assessing the risk level of LUAD patients with the MPGM risk model can inform treatment selection. For patients classified into the MPGM-low group, a thorough evaluation can determine whether they can be spared excessive treatment, reducing unnecessary suffering. Conversely, advanced-stage LUAD patients in the MPGM-high group may benefit from more comprehensive evaluation, allowing exploration of more effective treatment options, such as immunotherapy. Patient-specific genetic alteration data can therefore be used not only to identify actionable targets for targeted therapy but also to predict survival prognosis. Ultimately, this risk model has the potential to improve patient outcomes by supporting personalized treatment strategies.

FIGURE 1
The comprehensive clinical and somatic mutation characteristics of LUAD in the DPH cohort. (A) The landscape of somatic mutations. The upper panel shows the TMB level, the middle panel shows the clinical features, and the lower panel shows the gene mutations, colored according to variant type. The right label shows the mutation frequency as a percentage (blue bar) and the -LogFDR of significant driver genes (yellow bar). (B) Heatmap showing the correlation between 45 CCRGs and clinical risk factors. CCRGs: Clinical Characteristic-Related Genes. *p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001.

FIGURE 2
The oncogenic signaling pathway alterations and mutation signature enrichment analysis based on somatic mutations in the DPH cohort. (A) Alterations of 10 oncogenic pathways based on the mutation data. (B) Gene alterations of the RTK-RAS pathway. (C) Mutation signature enrichment analysis. (D) Comparison of smoking status between the "smoking" signature group and the rest.

TABLE 1
Univariable and multivariable analyses of OS of MPGMs of patients in the EAS cohort.

FIGURE 3
Development and validation of the MPGM risk model. (A) Kaplan-Meier curves of OS for patients with MPGM-high and MPGM-low scores in the EAS cohort. (B) ROC analysis showing the AUCs of the MPGM risk model in predicting the OS of patients in the EAS cohort. (C) ROC analysis showing the AUCs of the MPGM risk model and the model combined with NRF2 and PI3K pathway alterations in predicting the OS of patients in the EAS cohort. (D) ROC analysis showing the AUCs of the MPGM risk model and the model combined with clinical factors in predicting the OS of patients in the EAS cohort. (E) Kaplan-Meier curves of OS for patients with MPGM-high and MPGM-low scores in the MSK-LUAD cohort. (F) ROC analysis showing the AUCs of the MPGM risk model in predicting the OS of patients in the MSK-LUAD cohort. (G) Kaplan-Meier curves of PFS for 55 patients with MPGM-high and MPGM-low scores in the DPH cohort. ROC: receiver operating characteristic; AUC: area under the ROC curve.

FIGURE 4
Development of a nomogram model. (A) Forest plot showing the hazard ratio (HR) for OS of the MPGM risk model and clinicopathological factors in the EAS cohort. (B) A nomogram based on histological grade, stage, smoking status, sex, age, TMB, and the MPGM risk model for 1-, 3-, and 5-year OS predictions. (C) Calibration curves testing the agreement between 1-, 3-, and 5-year predicted OS and actual observations in the EAS cohort.

FIGURE 5
Stratified survival analysis of the EAS cohort samples based on the MPGM risk model, performed according to different clinical application scenarios. (A) Kaplan-Meier curves of OS for early-stage LUAD patients (stage I-II, N = 191) based on the MPGM risk model. (B) Kaplan-Meier curves of OS for advanced-stage patients (stage III-IV, N = 106) based on the MPGM risk model. (C) Kaplan-Meier curves of OS for LUAD patients who received TKI treatment based on the MPGM risk model. (D) Kaplan-Meier curves of OS for LUAD patients who underwent chemotherapy (N = 68) based on the MPGM risk model. (E) Kaplan-Meier curves of OS for LUAD patients without EGFR mutations (N = 154) based on the MPGM risk model. (F) Kaplan-Meier curves of OS for 169 LUAD patients with corresponding RNA-seq data based on the MPGM risk model.

characteristics, facilitating risk stratification for patients, particularly with limited treatment options.